Unifying Decision Trees Split Criteria Using Tsallis Entropy
Abstract
The construction of efficient and effective decision trees remains a key topic in machine learning because of their simplicity and flexibility. Many heuristic algorithms have been proposed to construct near-optimal decision trees. Most of them, however, are greedy algorithms, which have the drawback of finding only local optima. Moreover, common split criteria, e.g. Shannon entropy, Gain Ratio and Gini index, are inflexible because they lack parameters that can be adjusted to the data set. To address these issues, we propose a series of novel methods based on Tsallis entropy. First, a Tsallis Entropy Criterion (TEC) algorithm is proposed to unify Shannon entropy, Gain Ratio and Gini index, which generalizes the split criteria of decision trees. Second, we propose a Tsallis Entropy Information Metric (TEIM) algorithm for efficient construction of decision trees. The TEIM algorithm takes advantage of the adaptability of Tsallis conditional entropy and the greediness-reducing ability of a two-stage approach. Experimental results on UCI data sets indicate that the TEC algorithm achieves statistically significant improvement over the classical algorithms, and that the TEIM algorithm yields significantly better decision trees in both classification accuracy and tree complexity.
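The unification mentioned in the abstract can be illustrated with the standard definition of Tsallis entropy, S_q(p) = (1 - sum_i p_i^q) / (q - 1), which tends to Shannon entropy (in nats) as q approaches 1 and equals the Gini index at q = 2. The Python sketch below is illustrative only and is not the TEC or TEIM algorithm: the function names `tsallis_entropy`, `class_probs` and `tsallis_gain` are hypothetical, and the gain weights each child by its sample fraction, as in ordinary information gain, which is an assumption that may differ from the conditional Tsallis entropy used in the paper. The Gain Ratio case is not covered here.

```python
import math


def tsallis_entropy(probs, q):
    """Tsallis entropy S_q(p) = (1 - sum_i p_i^q) / (q - 1).

    At q = 1 the Shannon entropy (natural log) is returned, which is the
    limit of the expression above as q approaches 1.
    """
    probs = [p for p in probs if p > 0]  # zero-probability classes contribute nothing
    if abs(q - 1.0) < 1e-12:
        return -sum(p * math.log(p) for p in probs)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)


def class_probs(counts):
    """Turn per-class sample counts into class probabilities."""
    total = sum(counts)
    return [c / total for c in counts]


def tsallis_gain(parent_counts, children_counts, q):
    """Impurity reduction of a split, measured with Tsallis entropy.

    Assumption: children are weighted by their share of the parent's
    samples, as in standard information gain.
    """
    n = sum(parent_counts)
    parent = tsallis_entropy(class_probs(parent_counts), q)
    weighted = sum(
        (sum(child) / n) * tsallis_entropy(class_probs(child), q)
        for child in children_counts
        if sum(child) > 0  # skip empty branches
    )
    return parent - weighted


if __name__ == "__main__":
    balanced = [0.5, 0.5]
    print(tsallis_entropy(balanced, 1.0))  # Shannon entropy in nats
    print(tsallis_entropy(balanced, 2.0))  # Gini index
    print(tsallis_gain([5, 5], [[4, 1], [1, 4]], 1.5))  # an intermediate q
```

In this sketch q is the adjustable parameter that the classical criteria lack: it could, for instance, be chosen per data set by cross-validation, although how the paper selects it is not stated in the abstract.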
Similar resources
A less-greedy two-term Tsallis Entropy Information Metric approach for decision tree classification
The construction of efficient and effective decision trees remains a key topic in machine learning because of their simplicity and flexibility. Many heuristic algorithms have been proposed to construct near-optimal decision trees. Most of them, however, are greedy algorithms that have the drawback of finding only local optima. Besides, the conventional split criteria they use, e.g. Shannon ...
Comparison of Shannon, Renyi and Tsallis Entropy Used in Decision Trees
Shannon entropy used in standard top-down decision trees does not guarantee the best generalization. Split criteria based on generalized entropies offer different compromises between purity of nodes and overall information gain. Modified C4.5 decision trees based on Tsallis and Renyi entropies have been tested on several high-dimensional microarray datasets with interesting results. This approac...
Tsallis Entropy and Conditional Tsallis Entropy of Fuzzy Partitions
The purpose of this study is to define the concepts of Tsallis entropy and conditional Tsallis entropy of fuzzy partitions and to obtain some results concerning this kind of entropy. We show that the Tsallis entropy of fuzzy partitions has the subadditivity and concavity properties. We study this information measure under the refinement and zero mode subset relations. We check the chain rules for ...
Tsallis Entropy Theory for Modeling in Water Engineering: A Review
Water engineering is an amalgam of engineering (e.g., hydraulics, hydrology, irrigation, ecosystems, environment, water resources) and non-engineering (e.g., social, economic, political) aspects that are needed for planning, designing and managing water systems. These aspects and the associated issues have been dealt with in the literature using different techniques that are based on different ...
Application of Different Methods of Decision Tree Algorithm for Mapping Rangeland Using Satellite Imagery (Case Study: Doviraj Catchment in Ilam Province)
The use of satellite imagery for the study of Earth's resources has attracted the attention of many researchers. In fact, different phenomena have different spectral responses in electromagnetic radiation. One major application of satellite data is the classification of land cover. In recent years, a number of classification algorithms have been developed for classification of remote sensing data. One of the most nota...